Abstract 

The purpose of this article is to comment on the prior article entitled “Examining 
Instruction, Achievement and Equity with NAEP mathematics data,” by Sarah 
Theule Lubienski. That article claims that a prior article by the author suffered 
from three weaknesses: (1) an attempt to justify No Child Left Behind (NCLB); (2) 
drawing causal inferences from cross-sectional data; and (3) various statistical 
quibbles. The author responds to the first claim by indicating that any mention of 
NCLB was intended purely to make the article relevant to a policy journal; to the 
second claim, by noting his own reservations about using cross-sectional data to 
draw causal inferences; and to the third claim by noting potential issues of 
quantitative methodology in the Lubienski article. He concludes that studies that 
use advanced statistical methods are often so opaque as to be difficult to compare, 
and suggests some advantages to the quantitative transparency that comes from the 
findings of randomized controlled field trials. 
Resumen 

El objetivo de este artículo es comentar el trabajo “Examinando instrucción, 
logros, y equidad usando resultados de matemáticas de NAEP” de Sarah Theule 
Lubienski previamente publicado en EPAA. Ese artículo sostenía que otro artículo 
publicado por mí sufría de tres debilidades: (1) Intentaba justificar la ley federal sin 
abandonar ningún niño; (2) Realizaba inferencias causales a partir de datos 
obtenidos en estudios cross seccionales; (3) otras objeciones sobre las estadísticas 
usadas. Este autor responde a la primera objeción indicando que cualquier mención 
a la ley federal sin abandonar ningún niño tuvo como objetivo establecer la 
relevancia de este trabajo para una revista de análisis político; a la segunda 
objeción, respondo indicando mis propias reservas al uso de estudios cross 
seccionales para establecer relaciones causales; y a la tercera objeción, indicando 
algunos problemas potenciales en la metodología cuantitativa del trabajo de 
Lubienski. Este autor concluye que los estudios que usan métodos cuantitativos 
avanzados suelen ser opacos y difíciles de entender y comparar, y sugiere que la 
transparencia cuantitativa deviene de obtener resultados en experimentos con 
poblaciones seleccionadas al azar y campos controlados. 

Editor’s Note: This article is a response to Sarah Lubienski’s (2006) article that appears at 
http://epaa.asu.edu/epaa/v14n14/, which discussed Wenglinsky’s (2004) article available at 
http://epaa.asu.edu/v12n64/. It is the practice of Education Policy Analysis Archives to publish one 
round of responses to articles where it is merited. Additional discussion of this and other articles is welcome 
online at http://epaa.info/wordpress. 


Over the last decade, the author has published half a dozen studies of relationships among 
school and teacher characteristics and student achievement using data from the National 
Assessment of Educational Progress (NAEP). Otherwise known as “the Nation’s Report Card,” 
NAEP provides test data on nationally representative samples of fourth, eighth and twelfth graders 
in a variety of subjects over multiple years. There are many methodological challenges faced in the 
analysis of NAEP data, but none as nettlesome as its cross-sectional nature. Because the data are 
cross-sectional, the finding of relationships between school characteristics, such as class size, and 
student achievement cannot be used to draw causal inferences. This point has been made by the 
author in nearly all of his publications on the topic, and is reiterated by opponents of the author’s 
conclusions. If the policy conclusions are of a constructivist nature, the author finds himself 
attacked by conservative-leaning researchers on the grounds that he is making causal inferences. If 
the policy conclusions are of a didactic nature, the author finds himself attacked by liberal-leaning 
researchers, generally on the same grounds. The critics usually also sprinkle in some methodological 
quibbles, such as wondering what the results would have been if variable X had been measured 
slightly differently, but the core criticism is that the cross-sectionality of the data makes causal 
inferences impossible. 

A recent instance of this occurred with Sarah Theule Lubienski’s (2006) response to the 
current author’s article on the achievement gap, “Closing the Racial Achievement Gap: The Role of 
Reforming Instructional Practices” (Wenglinsky, 2004). The current author’s article distinguished 
between two types of racial achievement gap, that “between schools,” meaning between 
predominantly minority and predominantly white schools, and “within schools,” meaning between 
White and minority students in the same school. The author found that a variety of instructional 
practices, mostly of the constructivist variety, but some not, were negatively related to the within- 
school achievement gap, but unrelated to the between-school achievement gap. The author 
concluded that using the identified practices might be a viable strategy for reducing the achievement 
gap within schools, but not the one between schools, which would require some more macro- 
institutional change. Lubienski’s (2006) response, “Examining Instruction, Achievement and Equity 
with NAEP Mathematics Data,” also found some instructional practices to be associated with the 
racial achievement gap, but with the caveats that only constructivist techniques evinced a 
relationship and that these techniques did not go so far as eliminating achievement gaps (the gaps 
she examined were analogous to the within-school gaps the author examined). 

Lubienski’s article raised the three questions that the current author finds are commonly 
raised about NAEP secondary analyses. First, Lubienski suggested an ideological underpinning to 
her critique: The current author’s study was supposed to suggest how the Bush Administration’s No 
Child Left Behind (NCLB) could succeed, whereas her study focused on empirical support for the 
National Council of Teachers of Mathematics’ “reform-minded” (read constructivist) instructional 
practices. Second, Lubienski raised the issue of causal inferences, claiming that the current author 
used causal language in his study. And, third, she proposed some statistical quibbles which 
amounted to the notion that she approached her analysis slightly differently than the current author 
did. 

With regard to the first argument, the current author had no ideological agenda. The 
mention of NCLB did not constitute an endorsement of it, but simply an attempt to find some 
relevance to policymakers in an article submitted to a policy journal. It has typically been the 
author’s experience that the findings of NAEP secondary analyses rarely fit neatly into one 
ideological framework or another. Thus the author’s study of school finance found that the 
effectiveness of school dollars depended upon how they were spent (Wenglinsky, 1997). As another 
example, the author’s foray into the debate about whether educational technology made a difference 
found that technology effects depended upon how the technology was used (Wenglinsky, 2005). 
Rarely are the findings from statistical analyses of large-scale data unequivocal, and “Closing the 
Racial Achievement Gap” was no different, finding that the effective practices were an ideological 
potpourri, leaning somewhat towards the constructivist side. 

The second argument, about causal inferences, is to some extent a red herring. As Lubienski 
admits, the current author acknowledged repeatedly that causal inferences cannot be drawn from 
cross-sectional data. He specifically noted that while he would use the phrase “school effect,” he 
meant it in the statistical sense (as in “effect size”) and did not intend it to connote causality. 
Lubienski herself sometimes falls into causal language, such as when she refers to instructional 
practices as “predictors” of achievement, suggesting that they are temporally, and thus causally, prior 
to test scores (Lubienski, 2006, p. 7). And in another analysis of NAEP data, in which Lubienski 
seeks to measure the relationship between attending a private school and student achievement, she 
talks of private-school effects (C. Lubienski & S. T. Lubienski, 2006). Given this challenge in both 
Wenglinsky’s and Lubienski’s work, one may rightly ask whether it is worthwhile to engage in 
secondary analyses of NAEP at all. The answer is that, while correlational analyses are not good at 
establishing causal relationships, they are good at identifying variables that should subsequently be 
subject to more rigorous analyses. This view is the rationale behind the Institute of Education 
Sciences’ framework for its discretionary grants program; it has a continuum of research goals under 
which researchers can apply, beginning with secondary analyses to identify variables of importance, 
proceeding to developmental studies that make the transition from variable identification to the 
creation of an intervention, and ending with experimental studies that can establish causation for 
interventions. Correlational analyses are an important first step because they 
suggest where to concentrate and where not to concentrate. Given that, in the Lubienski study, the 
use of calculators proved unrelated to the racial achievement gap, it is unlikely that providing 
calculators to students, alone, is a worthwhile intervention. Thus Lubienski is correct in suggesting 
that cross-sectional data do not support causal inferences, but such secondary analyses are a crucial 
piece of work prior to the development and testing of educational interventions. 

There is perhaps a more important reason that secondary analyses must be viewed as 
preliminary to more rigorous research designs: secondary analyses nearly always 
fall victim to “statistical reification.” This term refers to the fact that researchers typically treat the 
statistical method of the day as an absolute basis for truth, even though the rationale behind using 
the technique, to say nothing of the mathematics involved, is typically opaque. Why use hierarchical 
linear modeling rather than structural equation modeling? When is multicollinearity severe enough to 
discredit results, given that most of the research questions of interest involve creating a significant 
degree of multicollinearity? Statistical quibbles can be raised about any secondary analysis. 
Lubienski’s is no exception. Although she does not present the correlations among her factors, she 
refers to their being highly correlated. Highly correlated factors suggest a poorly fitting factor model 
and therefore potentially invalidate it. In addition, the proper way to verify a factor structure is 
through confirmatory factor analysis using a separate replication sample, not through creating scales 
and computing Cronbach’s alphas on the same data. And, disturbingly, here the Cronbach’s alphas are 
mostly below what many consider to be the cutoff of .7. One other issue is that her models 
disaggregate teacher effects to the student level, which means that the instructional variables are only 
partitioning student-level variance in achievement, not school-level variance in achievement, and 
thus may understate the size of instructional effects. Does all of the foregoing invalidate her 
conclusions? The reader will probably decide by comparing the findings to his or her own 
experiences with education reform. 
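
For readers unfamiliar with the reliability statistic discussed above, the following is a minimal sketch in Python of how Cronbach’s alpha is conventionally computed from item scores. The function name, the data layout, and the sample responses are illustrative assumptions; none of them is taken from Lubienski’s analysis or from NAEP.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists; each inner list holds one item's scores
    for the same n respondents, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("a scale needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs scores for the same 2+ respondents")

    # Each respondent's total score across the k items.
    totals = [sum(scores) for scores in zip(*items)]

    # Sum of the item variances versus the variance of the totals.
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")

    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: three survey items, five respondents.
hypothetical_items = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 3],
    [3, 5, 2, 4, 4],
]
print(round(cronbach_alpha(hypothetical_items), 3))
```

A value near 1 indicates that the items vary together; the .7 figure mentioned above is a conventional rule of thumb rather than a property of the formula.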

Secondary analyses need to be conducted with limited goals, and some kind of more robust 
research design (quasi-experimental or experimental) used for the more ambitious goal of 
demonstrating the efficacy of an intervention. One reason for this is that experiments are designed 
to support causal inferences, by holding constant all variables besides exposure to the treatment. 
They therefore address selection bias in a way that statistical analyses cannot. But a more important 
reason is the transparent nature of the results of an experiment. The results are transparent because 
they generally involve performing a Student’s t-test on two raw mean scores, that of the treatment group and that 
of the control group. These kinds of comparisons are more likely to be persuasive to a policymaker or 
educator than the elaborate debates over the appropriate multivariate method, the appropriate fit 
statistic, or any number of other running debates among quantitative methodologists. This is not to 
say that experiments are the “gold standard,” but simply that they are less subject to reification, and 
therefore more trustworthy from the standpoint of making policy decisions. 
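
To illustrate how little machinery such a comparison requires, here is a minimal sketch in Python of the pooled-variance (Student’s) t statistic for a treatment group and a control group. The function name and the scores are hypothetical and serve only as an illustration; the sketch also assumes equal variances in the two groups, which a real evaluation would need to check.

```python
from math import sqrt
from statistics import mean, variance


def student_t(treatment, control):
    """Return (t statistic, degrees of freedom) for two independent groups,
    using the pooled-variance form of Student's t-test."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")

    # Pool the two sample variances, weighting each by its degrees of freedom.
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / df

    # Standard error of the difference between the two group means.
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("scores have zero variance in both groups")

    return (mean(treatment) - mean(control)) / std_err, df


# Hypothetical test scores for the two arms of an experiment.
treatment_scores = [78, 85, 90, 72, 88, 81]
control_scores = [70, 75, 83, 68, 79, 74]
t_stat, dof = student_t(treatment_scores, control_scores)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

Turning the statistic into a p-value requires the t distribution, which a statistics library such as SciPy provides through `scipy.stats.ttest_ind`.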